Presentation Group 5

Emmanuel, Adomas, Subhayan and Chuang

Introduction

  • Hidradenitis suppurativa (HS) - inflammatory skin disease linked to immune dysregulation and abnormalities in follicular structure and function.

  • Data - Lesional and non-lesional skin samples from 20 patients

  • Study objective - Find an expression-based disease signature of HS by observing affected and unaffected skin samples.

  • Methods - Differential expression and pathway enrichment analyses.

The dataset used for the study isn’t particularly diverse - for instance, only 3 males were sampled, all of whom were either active or ex-smokers.

Trace the Flow of the Project

Load

We have two different files from the Gene Expression Omnibus (GEO), one containing the metadata and the other the raw gene counts from the RNAseq.

# Tidy columns (with no prefixes)
meta_table <- meta_table |>
  mutate(
    Tissue = gsub("tissue: ", "", Tissue),
    Tissue_type = gsub("tissue_type: ", "", Tissue_type),
    Patient_id = gsub("patient_id: ", "", Patient_id),
    Age = gsub("age: ", "", Age),
    Gender = gsub("gender: ", "", Gender),
    Race = gsub("race: ", "", Race),
    Smoker= gsub("smoker: ", "", Smoker),
    BMI= gsub("bmi: ", "", BMI),
    Sample = str_split_i(Sample, pattern="_", 2),
  )

How we do it

# We join on patient ID
final_table <- meta_table |>
  left_join(
    gene_table,
    by = "Sample"
  )

Data Cleaning

  • Start with 58051 genes in the pool
    1. Remove gene if 0 expression for all patients (-6098)
    2. Remove gene if expression linear model coefficient not significant between diseased/healthy tissue (adj.p>0.05) (-51437)
    3. Remove gene if expression change for all patients < 3-fold between the diseased/healthy tissue groups ( -29)
  • 487 genes advance to analysis rounds

Modeling analysis

Linear regression on a binary value (perilesion or lesion) we want to see which gene expression varies the most.

Many collagens in the most upregulated genes

Volcano plot

We can see on a volcano plot the correlation between expression change and significance

A small relative change in expression implies that the gene is not significant (Bonferonni correction)

Volcano plot

We can see on a volcano plot the correlation between expression change and significance

A small relative change in expression implies that the gene is not significant (Bonferonni correction)

Volcano plot

We can see on a volcano plot the correlation between expression change and significance

A small relative change in expression implies that the gene is not significant (Bonferonni correction)

Annotation

We want to retrieve the name of the genes we are analyzing (instead of the Ensemble names)

We use the database org.HS.eg.db with the package AnnotationDbi (to manipulate the database)

# Find the corresponding gene SYMBOL for all ENSEMBL in org.HS.eg.db 
mapping <- AnnotationDbi::select(org.Hs.eg.db, 
                                 keys = keys(org.Hs.eg.db, keytype = "ENSEMBL"),
                                 keytype = "ENSEMBL",
                                 columns = c("SYMBOL", "GENENAME", "CHR")
                                ) |> 
                  as_data_frame()

# Use left_join from tidyverse
annotated_data <- expression_data |> 
  left_join(mapping, by = c('gene' = 'ENSEMBL'))

no_annotation_genes <- annotated_data |> 
  filter(is.na(SYMBOL)) 

# Antijoin keep all element that is not in the no_annotation_genes
annotated_data <- annotated_data |> 
  anti_join(no_annotation_genes, 
            by = 'gene')

# Genes with multiple annotation would have more than 20 entrees
duplicates <- annotated_data |> 
  group_by(gene) |> 
  filter(n() > 20) 

mult_annotated_genes <- duplicates |>
  dplyr::select(gene, SYMBOL, GENENAME) |> 
  distinct() |> 
  ungroup() 

annotated_data <- annotated_data |> 
  anti_join(mult_annotated_genes, by = 'gene') 

Inconsistencies in the number of lesional and non-lesional reads per sample.

58051 genes in total, 1855264316 reads

450 significant genes, 84523661 reads

Some samples have more non-lesional reads, while some have more lesional reads.

Principal Component Analysis

Score Plot

Variance Explained

Certain chromosomes have more genes involved in immune activity

Interesting - No downregulated genes in the Y chromosome

Most Upregulated and Downregulated Genes

Another way to look at it

Functional Assessment of Upregulated and Downregulated Genes

  • Upregulated genes:
    • NTRK1: Neuronal survival and Signal Transduction\n(MAPK, PI3K-Akt, PLC-\(\gamma\))
    • POU2AF1: B-Cell immune responds
    • SIGLEC7: Inhibition of NK-Cell responds
  • Downregulated genes:
    • KRT77: Kreatin, Intermediate filament protein
    • CHRM1: Muscarinic receptor
    • PHYHIP: Fatty acid metabolism and perioxisomal function

Gene Ontology Analysis

Enhanced Activation of Inflammatory, Innate and Adaptive immune pathway

Reduced Keratinization and Intermediate filament activity

Discussion

  • Observations by Freudenberg et al.
    • Gene Upregulation
      • MMP1, C-X-C, SERPINB4, and S100.
    • Enhanced Immune Responses
      • Upregulation in both adaptive and innate immune system functions.
    • Reduced Keratinization and Intermediate Filament Activity
  • Potential Causes for Variance in Results
    • Model Selection
      • lm vs glm, Empirical Bayes
    • Adjusted P-value Cutoff Choices
    • Gene Selection Criteria
      • Focus only on protein-encoding genes